This is an OS/2 command-line utility that strips HTML codes from
files saved from WebX or other web browsers.
UNH 2.02 HTML stripper by Don Hawkinson dwhawk@southwind.net
usage: ..\unh file1 file2 <file3>
file1 == html file
file2 == stripped text output file
file3 == URLs from html source file - optional
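The three-file usage above can be illustrated with a short sketch.
This is not UNH's actual logic, which this README does not describe.
It is a minimal Python approximation, and the names strip_html and
unh_like, the regular expressions, and the choice of cp850 for file
I/O are all assumptions made for illustration.

```python
import re

# Assumption: a tag is anything between "<" and the next ">".
TAG_RE = re.compile(r"<[^>]*>")
# Assumption: URLs are taken from href attributes only.
HREF_RE = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def strip_html(html):
    """Return (text, urls): the input with tags removed, plus href targets."""
    urls = HREF_RE.findall(html)
    text = TAG_RE.sub("", html)
    return text, urls


def unh_like(file1, file2, file3=None):
    """Hypothetical stand-in for 'unh file1 file2 <file3>'."""
    with open(file1, "r", encoding="cp850") as src:
        text, urls = strip_html(src.read())
    # Mode "w" overwrites an existing file without asking, as UNH does.
    with open(file2, "w", encoding="cp850") as out:
        out.write(text)
    if file3 is not None:
        with open(file3, "w", encoding="cp850") as out:
            out.write("".join(url + "\n" for url in urls))
```

Calling unh_like("page.htm", "page.txt", "page.url") would mirror the
command line with the optional third file; leaving out the third
argument skips the URL list.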
The command line utility UNH does not check for the
existence of the output file, and will overwrite any existing
file. UNH is HPFS aware, so any valid OS/2 file names may be used.
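Because UNH overwrites without warning, a caller may want to check the
output name first. The following is a minimal sketch of such a guard
in Python; safe_output_path is a hypothetical helper, not part of UNH.

```python
import os


def safe_output_path(path):
    """Return path unchanged, or raise if a file already exists there."""
    if os.path.exists(path):
        # UNH itself performs no such check and would overwrite the file.
        raise FileExistsError(path + " already exists and would be overwritten")
    return path
```

The check would run before UNH is invoked, so an existing output file
is reported instead of being replaced.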
Character Entity Sets or tags
The HTML specification defines Character Entity Sets or tags
to represent particular graphic characters which have special
meanings in places in the markup, or may not be part of the
character set available to the writer. UNH does not attempt
to scan for all of the possible tags, but does try to resolve
the most common tags.
This version of UNH supports codepages 437 and 850. If codepage
850 is in use, the 850 character set is used. The codepage only
makes a difference when &xxxx; tags are present in the file. If
neither the correct character nor an acceptable alternate is
available in the codepage, the &xxxx; tag is left in the file.
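The codepage rule can be sketched as follows. This is an illustration
under stated assumptions, not UNH's code: it uses Python's standard
entity table and its cp437 and cp850 codecs, the name resolve_entities
is hypothetical, and it does not attempt the "acceptable alternate"
substitution, since the README does not list the alternates.

```python
import re
from html.entities import name2codepoint

# Assumption: a named tag looks like "&name;" with an alphanumeric name.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text, codepage="cp437"):
    """Replace &name; with its character when the codepage can encode it."""
    def repl(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)      # unknown name: leave the tag in place
        char = chr(codepoint)
        try:
            char.encode(codepage)
        except UnicodeEncodeError:
            return match.group(0)      # not in this codepage: leave the tag
        return char
    return ENTITY_RE.sub(repl, text)
```

For example, the copyright sign exists in codepage 850 but not in 437,
so "&copy;" is resolved under cp850 and left as a tag under cp437.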
Only a few of the &#nnn; tags are supported. They do not seem to
be widely used, and scanning for all of them would increase the time
it takes to process an .HTML or .HTM file.
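A small lookup table is one way to support only a few &#nnn; tags
cheaply. The sketch below is an assumption throughout: the README does
not say which codes UNH supports, so the SUPPORTED table is a
hypothetical subset, and leaving unsupported codes in place is assumed
by analogy with the &xxxx; behavior.

```python
import re

# Hypothetical subset; the real list of supported codes is not documented.
SUPPORTED = {34: '"', 38: "&", 60: "<", 62: ">"}

NUMERIC_RE = re.compile(r"&#(\d+);")


def resolve_numeric(text):
    """Replace supported &#nnn; tags; leave all others unchanged."""
    def repl(match):
        return SUPPORTED.get(int(match.group(1)), match.group(0))
    return NUMERIC_RE.sub(repl, text)
```

Keeping the table short means each match costs a single dictionary
lookup, in keeping with the concern about processing time.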
The command line utility UNH.EXE uses the same logic as PMStripper
to strip the HTML codes from files.